fix(agent): harden prompt reliability (result verification, budget semantics, safety boundaries, epistemic discipline) #1354

Merged
CodFrm merged 4 commits into release/v1.4-agent from v1.4-agent-prompt-change-01 on Apr 23, 2026

@cyfung1031
Collaborator

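Background

Observation of real agent behavior revealed four systematic failure modes:

  1. The main agent consumes sub-agent results blindly: even when a result contains failure information, it is silently merged into the final output, so errors propagate unnoticed.
  2. Sub-agents give up too early because they misread budget semantics: the existing wording matches the framing of the main agent's 50-round limit, so a sub-agent assumes its budget is tight and terminates subtasks it could have completed.
  3. The irreversible-actions list omits userscripts: installing or modifying a userscript is as irreversible and high-risk as submitting a form, yet it was not covered by the confirm-first flow.
  4. Sub-agents guess silently and do not separate confirmed facts from inferences: the results handed to the main agent mix facts, inferences, and gaps, so it cannot make sound decisions.

In addition, the compact summarizer frequently drops the user's mid-task corrections in long conversations, so a resumed agent repeats mistakes that were already corrected. When parallel sub-agents depend on one another, a downstream agent also carries on silently even if its upstream did not succeed.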


Changes

system_prompt.ts

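Added: rules for receiving sub-agent results

In SECTION_SUB_AGENT, a ### Receiving Sub-Agent Results section is inserted after ### Anti-Patterns. It requires that:

  • On receiving a result, check the Issues field first; if it reports problems, make an explicit decision (retry / switch agent / escalate to the user) rather than merging silently
  • Partial completion ≠ success; treat it as a partial failure
  • Before merging results from multiple sub-agents, validate each one independently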
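
The three rules above can be sketched as a small triage step. This is only an illustration of the intended decision flow, not code from this PR: the `SubAgentResult` shape, the `status` values, and the function names are all assumptions, since the PR only says that results carry an Issues field.

```typescript
// Hypothetical result shape: the PR only mentions an "Issues" field.
type SubAgentStatus = "complete" | "partial" | "failed";

interface SubAgentResult {
  agentId: string;
  status: SubAgentStatus;
  output: string;
  issues: string[];
}

type Decision =
  | { kind: "accept"; agentId: string }
  | { kind: "needs_decision"; agentId: string; reasons: string[] };

// Validate one result in isolation: any reported issue, and any status
// other than "complete", must be surfaced instead of merged silently.
function triageResult(result: SubAgentResult): Decision {
  const reasons = [...result.issues];
  if (result.status !== "complete") {
    reasons.push(`status is "${result.status}", treated as a failure`);
  }
  return reasons.length === 0
    ? { kind: "accept", agentId: result.agentId }
    : { kind: "needs_decision", agentId: result.agentId, reasons };
}

// Merge only the results that passed their own check; everything else is
// returned as pending so the caller can retry, switch agent, or escalate.
function mergeResults(results: SubAgentResult[]): { merged: string; pending: Decision[] } {
  const accepted: string[] = [];
  const pending: Decision[] = [];
  for (const result of results) {
    const decision = triageResult(result);
    if (decision.kind === "accept") {
      accepted.push(result.output);
    } else {
      pending.push(decision);
    }
  }
  return { merged: accepted.join("\n"), pending };
}
```

In this sketch a partial result never reaches the merged output; it stays in `pending` until someone makes an explicit decision about it.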

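Added: fallback guidance for parallel tasks

Appended to the end of ### Writing Sub-Agent Prompts: if a sub-agent depends on upstream output (such as an OPFS file), the delegation prompt must state the fallback behavior for missing input and must not assume the upstream succeeded.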
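
One way to make this rule hard to forget is to build delegation prompts through a helper that refuses upstream inputs without a fallback. The sketch below is an assumption-laden illustration: `UpstreamInput`, `buildDelegationPrompt`, and the prompt wording are hypothetical and do not appear in this PR.

```typescript
// Hypothetical description of an input produced by an upstream sub-agent.
interface UpstreamInput {
  description: string; // what the input is
  location: string;    // where it should be found, e.g. an OPFS path
  fallback: string;    // what to do if it is missing or empty
}

// Build a delegation prompt that always spells out the fallback behavior.
// Throws if any upstream input is declared without one.
function buildDelegationPrompt(task: string, inputs: UpstreamInput[]): string {
  const lines = [`Task: ${task}`];
  for (const input of inputs) {
    if (input.fallback.trim() === "") {
      throw new Error(`Missing fallback for upstream input: ${input.location}`);
    }
    lines.push(
      `Input: ${input.description} at ${input.location}`,
      `If this input is missing or empty: ${input.fallback}`,
    );
  }
  return lines.join("\n");
}
```

For example, a summarizing sub-agent could be told to "stop and report the missing file" rather than summarize from memory when the upstream file is absent.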

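Extended: confirmation list for irreversible actions

The first item of SECTION_SAFETY now appends installing or modifying userscripts after posting content and explains why: once installed, a userscript keeps running on every matching page, so the @match patterns and a summary of what the script does must be shown to the user for confirmation before installation.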
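
To show what "the @match patterns and a summary" could look like in practice, here is a minimal sketch that reads the standard `// ==UserScript==` metadata block and builds a confirmation message. The helper names and the message wording are hypothetical; the PR changes prompt text only and does not add such a function.

```typescript
// Hypothetical helper types; not part of this PR.
interface UserscriptSummary {
  name: string;
  description: string;
  matches: string[];
}

// Extract @name, @description and every @match from the metadata block.
function summarizeUserscript(source: string): UserscriptSummary {
  const summary: UserscriptSummary = { name: "(unnamed)", description: "(no description)", matches: [] };
  const block = /\/\/\s*==UserScript==([\s\S]*?)\/\/\s*==\/UserScript==/.exec(source);
  if (block === null) return summary;

  for (const line of (block[1] ?? "").split("\n")) {
    // Matches lines such as "// @match   https://example.com/*".
    const entry = /^\s*\/\/\s*@(\S+)\s+(.*\S)\s*$/.exec(line);
    if (entry === null) continue;
    const key = entry[1] ?? "";
    const value = entry[2] ?? "";
    if (key === "name") summary.name = value;
    else if (key === "description") summary.description = value;
    else if (key === "match") summary.matches.push(value);
  }
  return summary;
}

// Build the text shown to the user before anything is installed.
function buildInstallConfirmation(summary: UserscriptSummary): string {
  const scope = summary.matches.length > 0 ? summary.matches.join(", ") : "(no @match found)";
  return [
    `Install userscript "${summary.name}"?`,
    `Runs on: ${scope}`,
    `Summary: ${summary.description}`,
  ].join("\n");
}
```

The sketch only looks at `@match`, as the prompt text does; a script that declares its scope some other way would show "(no @match found)", which is itself a useful warning to surface.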

sub_agent_types.ts

SUB_AGENT_SECTION_TOOL_USAGE: budget semantics fix

The old wording:

You have a limited number of tool calls. Use them wisely…

is replaced with:

Your budget covers this subtask only — it is independent of the parent agent's budget. … Do not conserve budget by skipping verification steps or giving up prematurely.

This makes explicit that a sub-agent's budget applies only to the current subtask and is independent of the main agent's, removing the premature give-ups caused by trying to "save budget".
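
The independence described here can be pictured as two separate counters. The sketch below is purely illustrative: `ToolBudget` and `runSubtask` are hypothetical names, the sub-agent limit of 20 is made up, and only the main agent's 50-round figure comes from the background section.

```typescript
// Hypothetical per-agent counter of tool calls.
class ToolBudget {
  readonly limit: number;
  private used = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Spend one call if any remain; returns false when the budget is exhausted.
  trySpend(): boolean {
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}

// Each subtask gets a fresh budget, so its spending never touches the parent's.
function runSubtask(limit: number, callsNeeded: number): { completed: boolean; remaining: number } {
  const budget = new ToolBudget(limit);
  for (let i = 0; i < callsNeeded; i++) {
    if (!budget.trySpend()) return { completed: false, remaining: 0 };
  }
  return { completed: true, remaining: budget.remaining };
}

const parentBudget = new ToolBudget(50); // main agent's 50-round limit
const subtask = runSubtask(20, 12);      // 20 is an assumed sub-agent limit
```

After the subtask runs, `parentBudget.remaining` is still 50, so a verification call inside the subtask costs the parent nothing.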

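researcher.systemPromptAddition: confidence-tiered output

A rule is appended at the end: the output must distinguish three kinds of information:

  • Confirmed facts (prefixed with the source: "Source X states…")
  • Inferences (explicitly marked: "Based on the above, it appears…")
  • Gaps (stated outright: "I could not confirm…")

The three must not be blended into a single narrative; the main agent needs distinguishable confidence signals to decide correctly.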
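
As an illustration of why the tiers are useful downstream, the sketch below models the three kinds as a tagged union and renders each with the prefix quoted above. The `Finding` type, the function names, and the section headings are assumptions; the researcher prompt itself only prescribes the wording.

```typescript
// Hypothetical structured form of a researcher finding.
type Finding =
  | { kind: "confirmed"; source: string; claim: string }
  | { kind: "inferred"; claim: string }
  | { kind: "gap"; claim: string };

// Render one finding with the prefix that signals its confidence tier.
function renderFinding(finding: Finding): string {
  switch (finding.kind) {
    case "confirmed":
      return `${finding.source} states ${finding.claim}`;
    case "inferred":
      return `Based on the above, it appears ${finding.claim}`;
    case "gap":
      return `I could not confirm ${finding.claim}`;
  }
}

// Group findings by tier so facts, inferences, and gaps never share a paragraph.
function renderReport(findings: Finding[]): string {
  const sections: Array<[Finding["kind"], string]> = [
    ["confirmed", "Confirmed facts"],
    ["inferred", "Inferences"],
    ["gap", "Gaps"],
  ];
  const lines: string[] = [];
  for (const [kind, heading] of sections) {
    const items = findings.filter((f) => f.kind === kind);
    if (items.length === 0) continue;
    lines.push(`${heading}:`, ...items.map((f) => `- ${renderFinding(f)}`));
  }
  return lines.join("\n");
}
```

Because each finding carries its tier explicitly, a consumer can, for instance, refuse to act on anything that is not `confirmed`.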

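page_operator.systemPromptAddition: separating actions from results

A rule is appended at the end: "clicked the submit button" and "the form was submitted successfully" are two different facts. After every action, the result must be verified with get_tab_content / execute_script; if it cannot be confirmed, say so plainly instead of inferring success.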
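
The distinction can be sketched as a wrapper that performs an action and then reports only what it could verify. Everything here is hypothetical: `actThenVerify` and `ActionReport` are illustrative names, and `readPage` is a stand-in for whatever get_tab_content returns, whose real signature is not shown in this PR.

```typescript
// The report separates what was done from what was observed afterwards.
type ActionReport =
  | { action: string; outcome: "verified"; evidence: string }
  | { action: string; outcome: "unverified"; note: string };

// Perform an action, then read the page and look for a success marker.
// `readPage` stands in for a page-reading tool such as get_tab_content.
async function actThenVerify(
  action: string,
  perform: () => Promise<void>,
  readPage: () => Promise<string>,
  successMarker: string,
): Promise<ActionReport> {
  await perform();
  const content = await readPage();
  if (content.includes(successMarker)) {
    return { action, outcome: "verified", evidence: successMarker };
  }
  // The action happened, but success was not observed: say so, do not guess.
  return {
    action,
    outcome: "unverified",
    note: `"${successMarker}" not found after ${action}`,
  };
}
```

An `unverified` report still tells the parent agent that the click happened; it simply refuses to upgrade that into a claim that the form was submitted.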

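general.systemPromptAddition: transparent choices and honest failures

A rule is appended at the end: when several viable approaches exist, briefly explain the tradeoff; when an approach fails, report it as a failure rather than dressing it up as a "partial success".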

compact_prompt.ts

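buildCompactUserPrompt: priority of mid-task corrections

Section 3, User Messages, now:

  • States explicitly that mid-task corrections are highest priority
  • Requires verbatim recording of corrections the user inserts during an operation (such as "stop" or "do it differently")
  • Spells out the consequence: these messages are the ones most easily lost in long conversations, and a resumed agent will repeat mistakes that were already corrected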

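Impact

| File | Nature of change |
| --- | --- |
| system_prompt.ts | String changes only; no type or logic changes |
| sub_agent_types.ts | String changes only; no type or logic changes |
| compact_prompt.ts | String changes only; no type or logic changes |
| system_prompt.test.ts | New assertions covering all changes in this PR; existing tests unchanged |

All changes are prompt text; none involve architectural changes, new agent types, or TypeScript runtime logic.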


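Tests

The new assertions cover every text change in this PR, including:

  • The ### Receiving Sub-Agent Results section and its key wording
  • The new budget wording (covers this subtask only); the old wording (Use them wisely) is asserted to no longer appear
  • The Safety section's installing or modifying userscripts and its @match explanation
  • The compact prompt Section 3 phrase Mid-task corrections are highest priority and the verbatim-recording requirement
  • The researcher three-tier confidence labeling rule
  • The page_operator action/result separation rule
  • The general tradeoff transparency and failure honesty rule
  • The fallback guidance wording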
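
Assertions of this kind boil down to "these phrases must be present, these must be gone". The sketch below shows that pattern without any test framework. It is not the PR's test code: `findViolations` is a hypothetical helper and `samplePrompt` is stand-in text, whereas a real test would import the generated prompt.

```typescript
interface PhraseCheck {
  mustContain: string[];    // phrases the new prompt has to include
  mustNotContain: string[]; // stale phrases that must no longer appear
}

// Return a human-readable list of every phrase rule the prompt breaks.
function findViolations(prompt: string, check: PhraseCheck): string[] {
  const violations: string[] = [];
  for (const phrase of check.mustContain) {
    if (!prompt.includes(phrase)) violations.push(`missing: ${phrase}`);
  }
  for (const phrase of check.mustNotContain) {
    if (prompt.includes(phrase)) violations.push(`stale: ${phrase}`);
  }
  return violations;
}

// Stand-in text only; the real prompts live in the source files listed above.
const samplePrompt =
  "Your budget covers this subtask only. Mid-task corrections are highest priority.";

const violations = findViolations(samplePrompt, {
  mustContain: ["covers this subtask only", "Mid-task corrections are highest priority"],
  mustNotContain: ["Use them wisely"],
});
```

Checking for the absence of the old phrase matters as much as checking for the new one, since a prompt that contains both would send the sub-agent mixed signals.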

Contributor

Copilot AI left a comment

Pull request overview

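This PR updates the prompt text and test assertions for the main agent, sub-agents, and the compact summarizer to harden agent orchestration (sub-agent result acceptance, budget semantics, boundaries on irreversible actions, and output confidence and verification discipline), reducing systematic failure modes such as "silent failure propagation" and "claiming success without verification".

Changes:

  • Adds rules for accepting sub-agent results and fallback guidance for dependent tasks to the main agent prompt, and extends the irreversible-actions confirmation list to cover userscripts.
  • Fixes the tool-call budget semantics in the sub-agent prompt and adds output discipline for the researcher, page_operator, and general roles (confidence tiers, action/result separation, transparent tradeoffs, and honest failures).
  • Updates the system prompt tests with new assertions covering most of the text changes in this PR.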

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

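| File | Description |
| --- | --- |
| src/app/service/agent/core/system_prompt.ts | Main/sub-agent system prompt updates: stronger safety, rules for receiving sub-agent results, fallback for dependent tasks, sub-agent budget semantics fix. |
| src/app/service/agent/core/system_prompt.test.ts | New assertions for the system prompt and sub-agent system prompt, covering the added sections and key wording. |
| src/app/service/agent/core/sub_agent_types.ts | Stricter output and verification rules for researcher/page_operator/general (confidence, result verification, honest failures, and so on). |
| src/app/service/agent/core/compact_prompt.ts | Raises the priority of "mid-task corrections" in the compact summarizer and requires recording them verbatim. |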

const SECTION_SAFETY = `## Safety

- - **Confirm before irreversible actions**: submitting forms, making purchases, deleting data, posting content.
+ - **Confirm before irreversible actions**: submitting forms, making purchases, deleting data, posting content, installing or modifying userscripts. For userscripts specifically, show the script's \`@match\` patterns and a summary of what it does before installing — a userscript runs on every matching page after installation and cannot be easily recalled.

Copilot AI Apr 23, 2026

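The main agent's Safety section adds a confirmation requirement for userscripts as an irreversible action, but SUB_AGENT_SECTION_SAFETY in the sub-agent prompt still covers only forms, purchases, deletions, posting, and similar cases. Because the actual click/install flow is more likely to run inside the page_operator sub-agent, this gap leaves a sub-agent without an equivalent safety constraint when it receives an "install script" task. Suggest also adding "installing or modifying userscripts" to the sub-agent Safety section and stating explicitly that when a sub-agent cannot call ask_user, it should stop and report the @match patterns and the summary back to the parent agent and wait for confirmation.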

Suggested change
- - **Confirm before irreversible actions**: submitting forms, making purchases, deleting data, posting content, installing or modifying userscripts. For userscripts specifically, show the script's \`@match\` patterns and a summary of what it does before installing — a userscript runs on every matching page after installation and cannot be easily recalled.
+ - **Confirm before irreversible actions**: submitting forms, making purchases, deleting data, posting content, installing or modifying userscripts. For userscripts specifically, show the script's \`@match\` patterns and a summary of what it does before installing — a userscript runs on every matching page after installation and cannot be easily recalled. If the executor cannot ask the user directly, it must stop and report the \`@match\` patterns plus the summary back to the parent agent for confirmation instead of proceeding.

Comment on lines 17 to +20
  3. **User Messages**
- - List ALL user messages that are not tool results
- - These are critical for understanding the user's feedback and changing intent
- - Include any mid-conversation corrections or preference changes
+ - List ALL user messages that are not tool results, in order
+ - **Mid-task corrections are highest priority** — if the user interrupted an ongoing operation with a correction (e.g. "stop", "do it differently", "that's wrong"), record these verbatim. These messages are the most commonly lost in long conversations and the most damaging to skip: a resumed agent will repeat the exact mistake that was already corrected.
+ - Include preference changes, clarifications, and any instruction that overrides an earlier one

Copilot AI Apr 23, 2026

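The Section 3 text of buildCompactUserPrompt has been strengthened semantically (mid-task corrections are emphasized as highest priority and must be recorded verbatim), but the existing compact_prompt.test.ts only asserts that the 8 section headings are present and does not cover this key constraint. Suggest adding assertions to that test file (for example, that it contains key phrases such as "Mid-task corrections are highest priority" and "record these verbatim") to guard against later prompt regressions.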

@CodFrm
Member

CodFrm commented Apr 23, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@CodFrm CodFrm merged commit 6f97ba3 into release/v1.4-agent Apr 23, 2026
8 checks passed
@CodFrm CodFrm deleted the v1.4-agent-prompt-change-01 branch April 23, 2026 06:42

3 participants